Introspection #385

Open

frodesundby wants to merge 16 commits into main from introspection

Conversation

@frodesundby
Contributor

No description provided.


Copilot AI left a comment


Pull request overview

Adds “instance group” introspection to the GraphQL API so clients can inspect ReplicaSet-backed rollout revisions, including env vars, mounted files, instances (pods), and recent K8s events.

Changes:

  • Introduces instancegroup workload package (models, queries, event translation, dataloader + ReplicaSet watcher).
  • Extends GraphQL schema and resolvers to expose Application.instanceGroups and InstanceGroup details.
  • Adjusts Pod transformer to retain metadata.ownerReferences so pods can be matched to ReplicaSets.
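
A client query against the new surface might look like the sketch below. The `Application.instanceGroups` field comes from the overview above; the parent path and the selected subfields are hypothetical and would need to match the actual schema in `internal/graph/schema/instancegroup.graphqls`:

```graphql
query {
  # Hypothetical path to an Application; adjust to the real schema.
  team(slug: "my-team") {
    application(name: "my-app", environment: "dev") {
      instanceGroups {
        # Subfields guessed from the file descriptions below.
        name
        instances { name }
        environmentVariables { name value }
        files { path }
        events { message }
      }
    }
  }
}
```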

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
internal/workload/instancegroup/queries.go Core listing + introspection logic for env vars, mounts, and instance (pod) membership
internal/workload/instancegroup/events.go Fetches and translates K8s events for ReplicaSets/pods into user-friendly messages
internal/workload/instancegroup/models.go Defines InstanceGroup and related GraphQL-facing models/enums
internal/workload/instancegroup/dataloader.go Adds loaders + ReplicaSet transformer + environment->dynamic client lookup
internal/workload/instancegroup/node.go Registers InstanceGroup as a Node ident type
internal/workload/dataloader.go Keeps Pod ownerReferences in cached pods to enable ReplicaSet ownership matching
internal/kubernetes/watchers/watchers.go Wires a new ReplicaSet watcher into the global watcher set
internal/graph/schema/instancegroup.graphqls Adds GraphQL types/fields for instance groups and related types
internal/graph/instancegroup.resolvers.go Implements resolvers for instanceGroups, env vars, mounts, instances, events
internal/cmd/api/http.go Injects instancegroup loader context into Graph request pipeline
.configs/gqlgen.yaml Adds instancegroup package to gqlgen autobind
internal/graph/gengql/applications.generated.go gqlgen output wiring Application.instanceGroups
internal/graph/gengql/instancegroup.generated.go gqlgen output for InstanceGroup types/resolvers
internal/graph/gengql/schema.generated.go gqlgen output: Node type switch + schema inclusion
internal/graph/gengql/root_.generated.go gqlgen output: resolver root + complexity hooks
internal/graph/gengql/teams.generated.go gqlgen output: Application field context updates
internal/graph/gengql/issues.generated.go gqlgen output: Application field context updates


Comment on lines +97 to +104
case env.ValueFrom != nil && env.ValueFrom.ConfigMapKeyRef != nil:
value := env.Value // will be empty for valueFrom, but we try
ev.Value = &value
ev.Source = InstanceGroupValueSource{
Kind: InstanceGroupValueSourceKindConfigMap,
Name: env.ValueFrom.ConfigMapKeyRef.LocalObjectReference.Name,
}
case env.ValueFrom != nil && env.ValueFrom.FieldRef != nil:

Copilot AI Apr 14, 2026


For env vars using ValueFrom.ConfigMapKeyRef, this sets Value to env.Value (which will be empty) and never resolves the referenced key from the ConfigMap. This contradicts the function doc that says ConfigMap variables include values. Suggest querying the ConfigMap and reading ConfigMapKeyRef.Key (honor Optional), or set Value=nil and make the doc accurate.

Contributor Author


Fixed in 64d1349 — ConfigMapKeyRef env vars now resolve their value via getConfigMapData() (K8s API call). The Optional flag is honored (debug log for optional, warn for required), and Value is nil if the lookup fails.
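
A minimal sketch of the resolution logic described in the reply, with plain maps standing in for the Kubernetes types and client lookup (the actual fix uses corev1 types and the getConfigMapData() helper, not shown here):

```go
package main

import "fmt"

// configMapKeyRef is a pared-down stand-in for corev1.ConfigMapKeySelector.
type configMapKeyRef struct {
	Name     string
	Key      string
	Optional bool
}

// resolveConfigMapEnv mirrors the behavior in the reply: look up the
// referenced key in the ConfigMap's data, log at debug for optional refs
// and warn for required ones, and leave the value nil when the lookup fails.
func resolveConfigMapEnv(ref configMapKeyRef, configMaps map[string]map[string]string) *string {
	data, ok := configMaps[ref.Name]
	if !ok {
		if ref.Optional {
			fmt.Printf("debug: optional ConfigMap %q not found\n", ref.Name)
		} else {
			fmt.Printf("warn: required ConfigMap %q not found\n", ref.Name)
		}
		return nil // Value stays nil when the lookup fails
	}
	v, ok := data[ref.Key]
	if !ok {
		return nil
	}
	return &v
}

func main() {
	cms := map[string]map[string]string{
		"app-config": {"LOG_LEVEL": "debug"},
	}
	if v := resolveConfigMapEnv(configMapKeyRef{Name: "app-config", Key: "LOG_LEVEL"}, cms); v != nil {
		fmt.Println(*v) // debug
	}
	if v := resolveConfigMapEnv(configMapKeyRef{Name: "missing", Optional: true}, cms); v == nil {
		fmt.Println("value is nil")
	}
}
```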

Comment on lines +294 to +313
if mount.SubPath != "" {
// Fetch content for the specific key
contents, err := getConfigMapFileContents(ctx, l, ig.EnvironmentName, ig.TeamSlug.String(), cmName)
if err != nil {
l.log.WithError(err).WithField("configmap", cmName).Warn("failed to resolve configmap content for subPath mount")
errMsg := fmt.Sprintf("ConfigMap '%s' could not be found or accessed. The application may fail to start until this is resolved.", cmName)
return []*InstanceGroupMountedFile{{
Path: mount.MountPath,
Source: source,
Error: &errMsg,
}}
}
f := &InstanceGroupMountedFile{
Path: mount.MountPath,
Source: source,
}
if fc, ok := contents[mount.SubPath]; ok {
f.Content = &fc.content
f.IsBinary = fc.isBinary
}

Copilot AI Apr 14, 2026


ConfigMap volume mounts with VolumeMount.SubPath are resolved using contents[mount.SubPath], but SubPath refers to the file path within the volume, not necessarily the ConfigMap key (it may match items[].Path when Items are specified). This can silently drop content for subPath mounts. Suggest mapping SubPath -> Key via the provided items (when present), otherwise fall back to using SubPath as the key.

Contributor Author


Fixed in 64d1349 — when SubPath is set and items are specified, we now map SubPath back to the actual ConfigMap key via the items list (item.Path → item.Key). Without items, SubPath is used directly as the key (unchanged).
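
A sketch of that mapping, with a simplified stand-in for corev1.KeyToPath (the actual fix operates on the unstructured volume data, not these types):

```go
package main

import "fmt"

// keyToPathItem is a pared-down stand-in for corev1.KeyToPath, the entries
// of a ConfigMap volume's "items" list.
type keyToPathItem struct {
	Key  string
	Path string
}

// keyForSubPath maps a VolumeMount.SubPath back to the ConfigMap key it
// refers to: when items are specified, SubPath matches an item's Path and
// the corresponding Key must be used; without items, keys are projected
// into the volume as file names directly, so SubPath is the key itself.
func keyForSubPath(subPath string, items []keyToPathItem) string {
	for _, it := range items {
		if it.Path == subPath {
			return it.Key
		}
	}
	return subPath
}

func main() {
	items := []keyToPathItem{{Key: "config.yaml", Path: "renamed.yaml"}}
	fmt.Println(keyForSubPath("renamed.yaml", items)) // config.yaml
	fmt.Println(keyForSubPath("other.txt", nil))      // other.txt
}
```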


// Fall back to metadata.creationTimestamp
if ts, ok, _ := unstructured.NestedString(obj.Object, "metadata", "creationTimestamp"); ok && ts != "" {
if t, err := time.Parse(time.RFC3339, ts); err == nil {

Copilot AI Apr 14, 2026


metadata.creationTimestamp is parsed with time.RFC3339, but Kubernetes timestamps often include fractional seconds. time.Parse(time.RFC3339, ts) will fail in that case, causing extractEventTimestamp to return zero and dropping events. Suggest parsing with time.RFC3339Nano (which accepts both with/without fractions) or trying both formats.

Suggested change
if t, err := time.Parse(time.RFC3339, ts); err == nil {
if t, err := time.Parse(time.RFC3339Nano, ts); err == nil {

Contributor Author


Fixed in 64d1349 — changed to time.RFC3339Nano which accepts both with and without fractional seconds.

Comment on lines +60 to +77
// Fetch events for each pod in the instance group
instances, err := ListInstances(ctx, ig)
if err != nil {
l.log.WithError(err).Warn("failed to list instances for event fetching")
} else {
for _, inst := range instances {
podEvents, err := fetchEvents(ctx, client, namespace, "Pod", inst.Name)
if err != nil {
l.log.WithError(err).WithField("pod", inst.Name).Warn("failed to fetch pod events")
continue
}
podName := inst.Name
for i := range podEvents {
if ev := translateEvent(&podEvents[i], &podName); ev != nil {
allEvents = append(allEvents, ev)
}
}
}

Copilot AI Apr 14, 2026


ListEvents issues one Kubernetes List call per pod instance (plus one for the ReplicaSet). For workloads with many replicas this becomes N+1 API calls and can add significant latency/load to the API server when the GraphQL field is queried. Consider adding server-side limits (Limit/Continue, time window), batching via a single namespace event list + in-memory filtering, and/or caching events per request.

Contributor Author


Acknowledged — for typical workloads (2-5 pods) this is acceptable. Events are fetched on-demand and K8s events have ~1 hour TTL. Can be optimized later with namespace-scoped list + in-memory filtering if needed.
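
The deferred optimization could be sketched as below: one namespace-scoped list followed by in-memory filtering, with a pared-down stand-in for corev1.Event (the actual code would make a single List call against the API server instead of one per pod):

```go
package main

import "fmt"

// event is a simplified stand-in for corev1.Event.
type event struct {
	InvolvedKind string
	InvolvedName string
	Message      string
}

// filterEvents selects, from a single namespace-wide event list, the
// events for the objects we care about (the ReplicaSet plus each pod),
// replacing N+1 per-object List calls with in-memory filtering.
func filterEvents(all []event, kind string, names map[string]bool) []event {
	var out []event
	for _, ev := range all {
		if ev.InvolvedKind == kind && names[ev.InvolvedName] {
			out = append(out, ev)
		}
	}
	return out
}

func main() {
	all := []event{
		{"Pod", "app-abc", "Started container"},
		{"Pod", "other-xyz", "Pulled image"},
		{"ReplicaSet", "app-7d9f", "Scaled up replica set"},
	}
	podEvents := filterEvents(all, "Pod", map[string]bool{"app-abc": true})
	rsEvents := filterEvents(all, "ReplicaSet", map[string]bool{"app-7d9f": true})
	fmt.Println(len(podEvents), len(rsEvents)) // 1 1
}
```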

Comment on lines +57 to +158
// transformReplicaSet strips unnecessary fields from ReplicaSets but keeps
// the data we need: labels, annotations (revision), replicas, pod template
// (containers with env/envFrom/volumeMounts/image, volumes).
func transformReplicaSet(in any) (any, error) {
rs := in.(*unstructured.Unstructured)

// --- Extract data we want to keep ---

// metadata.annotations - keep only deployment.kubernetes.io/revision
annotations := rs.GetAnnotations()
newAnnotations := map[string]string{}
if v, ok := annotations["deployment.kubernetes.io/revision"]; ok {
newAnnotations["deployment.kubernetes.io/revision"] = v
}

// metadata.labels - keep app and team
labelsToKeep := []string{"app", "team"}
labels := rs.GetLabels()
for k := range labels {
if !slices.Contains(labelsToKeep, k) {
delete(labels, k)
}
}

// spec.replicas
replicas, _, _ := unstructured.NestedInt64(rs.Object, "spec", "replicas")

// spec.template.spec.containers (keep name, image, env, envFrom, volumeMounts)
containers, _, _ := unstructured.NestedSlice(rs.Object, "spec", "template", "spec", "containers")
newContainers := make([]any, 0, len(containers))
for _, container := range containers {
c, ok := container.(map[string]any)
if !ok {
return nil, fmt.Errorf("container is not a map[string]any")
}
newC := map[string]any{
"name": c["name"],
}
if img, ok := c["image"]; ok {
newC["image"] = img
}
if env, ok := c["env"]; ok {
newC["env"] = env
}
if envFrom, ok := c["envFrom"]; ok {
newC["envFrom"] = envFrom
}
if vm, ok := c["volumeMounts"]; ok {
newC["volumeMounts"] = vm
}
newContainers = append(newContainers, newC)
}

// spec.template.spec.volumes
volumes, _, _ := unstructured.NestedSlice(rs.Object, "spec", "template", "spec", "volumes")

// status.replicas and status.readyReplicas
statusReplicas, _, _ := unstructured.NestedInt64(rs.Object, "status", "replicas")
readyReplicas, _, _ := unstructured.NestedInt64(rs.Object, "status", "readyReplicas")

// spec.template.metadata.labels (for pod matching)
templateLabels, _, _ := unstructured.NestedStringMap(rs.Object, "spec", "template", "metadata", "labels")

// metadata.ownerReferences (to link to Deployment/Application)
ownerRefs, _, _ := unstructured.NestedSlice(rs.Object, "metadata", "ownerReferences")

// --- Remove everything ---
fieldsToRemove := [][]string{
{"spec"},
{"status"},
{"metadata", "managedFields"},
{"metadata", "generateName"},
}
for _, field := range fieldsToRemove {
unstructured.RemoveNestedField(rs.Object, field...)
}

// --- Add back only what we need ---
rs.SetLabels(labels)
rs.SetAnnotations(newAnnotations)

_ = unstructured.SetNestedField(rs.Object, replicas, "spec", "replicas")
_ = unstructured.SetNestedSlice(rs.Object, newContainers, "spec", "template", "spec", "containers")
if len(volumes) > 0 {
_ = unstructured.SetNestedSlice(rs.Object, volumes, "spec", "template", "spec", "volumes")
}
if len(templateLabels) > 0 {
templateLabelsAny := make(map[string]any, len(templateLabels))
for k, v := range templateLabels {
templateLabelsAny[k] = v
}
_ = unstructured.SetNestedMap(rs.Object, templateLabelsAny, "spec", "template", "metadata", "labels")
}
if len(ownerRefs) > 0 {
_ = unstructured.SetNestedSlice(rs.Object, ownerRefs, "metadata", "ownerReferences")
}

_ = unstructured.SetNestedField(rs.Object, statusReplicas, "status", "replicas")
_ = unstructured.SetNestedField(rs.Object, readyReplicas, "status", "readyReplicas")

return rs, nil
}

Copilot AI Apr 14, 2026


This new transformer is fairly involved and is expected to run in informer WatchList paths where transforms can be applied more than once. There are tests for transformer idempotency in other workload packages (e.g. secret). Consider adding unit tests verifying transformReplicaSet is idempotent and preserves the fields needed by queries (revision, replicas, containers env/envFrom/volumeMounts, volumes, ownerRefs).

Contributor Author


Acknowledged — transformer tests are on the backlog. Will add idempotency tests in a follow-up.
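
The idempotency check such a test would perform can be sketched as follows; the transformer here is a toy allow-list stand-in, not the real transformReplicaSet, but the apply-twice-and-compare pattern is the same:

```go
package main

import (
	"fmt"
	"reflect"
)

// stripFields is a toy transformer standing in for transformReplicaSet:
// it keeps only an allow-list of top-level keys.
func stripFields(obj map[string]any) map[string]any {
	keep := map[string]bool{"metadata": true, "spec": true, "status": true}
	for k := range obj {
		if !keep[k] {
			delete(obj, k)
		}
	}
	return obj
}

// assertIdempotent applies the transform twice and checks the second pass
// is a no-op -- the property informer transform funcs must satisfy, since
// they can be applied more than once to the same cached object.
func assertIdempotent(transform func(map[string]any) map[string]any, obj map[string]any) bool {
	once := transform(obj)
	snapshot := deepCopy(once)
	twice := transform(once)
	return reflect.DeepEqual(snapshot, twice)
}

// deepCopy clones nested string-keyed maps so the snapshot is independent
// of the object mutated by the second transform pass.
func deepCopy(m map[string]any) map[string]any {
	out := map[string]any{}
	for k, v := range m {
		if mv, ok := v.(map[string]any); ok {
			out[k] = deepCopy(mv)
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	obj := map[string]any{
		"metadata": map[string]any{"name": "app-7d9f"},
		"spec":     map[string]any{"replicas": int64(3)},
		"junk":     "drop-me",
	}
	fmt.Println(assertIdempotent(stripFields, obj)) // true
}
```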

Lua integration tests covering:
- Application with instance groups (ReplicaSet filtering)
- Instance group with instances (Pod listing)
- Instance group with environment variables (env, configMapKeyRef, secretKeyRef, envFrom)
- Instance group with mounted files (ConfigMap/Secret volume mounts)

Also includes dataloader transformer idempotency unit test.

Include image name, container name, pod name, and rescale size
in translated event messages instead of dropping them. Also remove
nais.yaml tips from scheduling failure messages.

…ss, and last termination info

Extend ApplicationInstanceStatus with ready, lastExitReason, and lastExitCode
fields to give users actionable insight into instance health. Replace raw
Kubernetes reason strings with human-readable messages. Add TERMINATED state
for containers that exited successfully. Fix State() to use classifyWaiting
so benign startup states no longer incorrectly report Application as NOT_RUNNING.

…atterns

- Replace benignWaiting map[string]struct{} with slices.Contains
- Separate benign and problem message maps, remove redundant switch
- Replace sort.Slice with slices.SortFunc using time.Compare
- Replace sort.Strings with slices.Sorted(maps.Keys(...))
- Replace IsValid() switch statements with slices.Contains
@frodesundby frodesundby marked this pull request as ready for review April 14, 2026 13:58
@frodesundby frodesundby requested a review from a team as a code owner April 14, 2026 13:58